A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons
Authors
Abstract
This paper proposes a simple and fast person-name filter, which plays an important role in the automatic compilation of a large bilingual person-name lexicon. The filter is based on the pn score, which is the sum of two component scores: the score of the first name and that of the last name. Each component score is calculated from two term sets: a dense set, in which most of the members are person names, and a baseline set, which contains fewer person names. The pn score takes one of five values, {+2, +1, 0, −1, −2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. The pn score extends easily to a bilingual pn score, which takes one of nine values, by summing the scores of the two languages. Experimental results show that the method works well for monolingual person names in English and Japanese; the F-scores are 0.929 and 0.939, respectively. The bilingual person-name filter performs better still, with an F-score of 0.955.
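The scoring scheme described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a component score is derived from the dense and baseline sets, so the rule in `component_score` (+1 if the term occurs only in the dense set, −1 if only in the baseline set, 0 otherwise) is an assumption. The names `TermSets`, `component_score`, `pn_score`, and `bilingual_pn_score`, and the toy term sets, are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSets:
    """A dense set (mostly person names) and a baseline set (fewer person names)."""
    dense: frozenset
    baseline: frozenset


# Labels for the five pn-score values, as listed in the abstract.
LABELS = {
    2: "strong positive",
    1: "positive",
    0: "undecidable",
    -1: "negative",
    -2: "strong negative",
}


def component_score(term: str, sets: TermSets) -> int:
    """ASSUMED rule (not given in the abstract): membership comparison.

    Returns +1 if the term occurs only in the dense set, -1 if it occurs
    only in the baseline set, and 0 otherwise.
    """
    in_dense = term in sets.dense
    in_baseline = term in sets.baseline
    if in_dense and not in_baseline:
        return 1
    if in_baseline and not in_dense:
        return -1
    return 0


def pn_score(first: str, last: str, first_sets: TermSets, last_sets: TermSets) -> int:
    """Sum of the first-name and last-name component scores, in {-2, ..., +2}."""
    return component_score(first, first_sets) + component_score(last, last_sets)


def bilingual_pn_score(score_lang_a: int, score_lang_b: int) -> int:
    """Sum of two monolingual pn scores, giving one of nine values in {-4, ..., +4}."""
    return score_lang_a + score_lang_b


# Toy term sets, invented purely for illustration.
first_sets = TermSets(frozenset({"john", "mary"}), frozenset({"river", "table"}))
last_sets = TermSets(frozenset({"smith", "tanaka"}), frozenset({"river", "table"}))

score = pn_score("john", "smith", first_sets, last_sets)
print(score, LABELS[score])
```

Under the assumed rule, a candidate whose first and last names both appear only in the dense sets receives the maximum score of +2, and a candidate pair scoring +2 in both languages receives a bilingual score of +4.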
Similar resources
Fast and easy development of pronunciation lexicons for names
We show that a good approach for the grapheme-to-phoneme conversion of Dutch proper names (e.g. person names, toponyms, etc.) is to use a cascade of a general-purpose grapheme-to-phoneme (G2P) converter and a special-purpose phoneme-to-phoneme (P2P) converter. The G2P produces an initial transcription that is then transformed by the P2P. The P2P is automatically trained on reference transcripti...
Multilingual person name recognition and transliteration
We present a tool that extracts person names from multilingual news collections and matches name variants referring to the same person. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writing system. Due to our highly multilingual setting, we use an internal standard representation for name repres...
Automatic transcription error recovery for Person Name Recognition
Person Name Recognition from transcriptions of the spoken content of TV shows is a crucial step towards multimedia document indexing. Recognizing Person Names implies the combination of three main modules: Automatic Speech Recognition, Named Entity Recognition, and Entity Linking to associate the recognized surface form with a normalized Person Name. The three modules are potentially error prone. Hence, b...
Speaker Naming System by Associating Speech and Speaker Recognition Results
In this paper, we propose a system that associates person names with individual speaker sections. For this purpose, automatic speaker segmentation is carried out using online speaker modeling and speaker verification techniques. Key phrases and person names are also extracted by speech recognition. After this speaker segmentation and speech recognition, the person name is associated to...
Person Name Identification in Chinese Documents Using Finite State Automata
This research concerns the automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in many areas, especially in applications related to Web intelligent agents, such as Web search engines, Web data mining, and automatic Web information analysis. We have noted that while finite state automata (FSA) based techniq...
Publication date: 2010